Preparations

https://besjournals.onlinelibrary.wiley.com/doi/10.1111/j.1365-2656.2008.01390.x

Load the necessary libraries

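library(gbm)           #for gradient boosted models
library(car)           #for regression diagnostics and scatterplot matrices
library(dismo)         #for gbm.step (boosted regression tree helpers)
library(pdp)           #for partial dependence plots
library(ggfortify)     #for autoplot methods
library(randomForest)  #for random forests
library(tidyverse)     #for data wrangling and plotting
library(gridExtra)     #for arranging multiple plots
library(patchwork)     #for combining ggplots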

Scenario

Abalone are an important aquaculture shellfish, farmed for both their meat and their shells. Abalone can live up to 50 years, although their longevity is known to be influenced by a range of environmental factors. Traditionally, abalone are aged by counting their growth rings; however, this method is laborious and expensive. Hence a study was conducted in which abalone growth ring counts were matched with a range of more easily measured physical characteristics (such as shell dimensions and weights) to see whether any of these could be used as proxies for the number of growth rings (and therefore age).


Format of abalone.csv data file

Read in the data

abalone = read_csv('../public/data/abalone.csv', trim_ws=TRUE)
glimpse(abalone)
## Rows: 4,177
## Columns: 10
## $ SEX          <chr> "M", "M", "F", "M", "I", "I", "F", "F", "M", "F", "F", "M…
## $ LENGTH       <dbl> 0.455, 0.350, 0.530, 0.440, 0.330, 0.425, 0.530, 0.545, 0…
## $ DIAMETER     <dbl> 0.365, 0.265, 0.420, 0.365, 0.255, 0.300, 0.415, 0.425, 0…
## $ HEIGHT       <dbl> 0.095, 0.090, 0.135, 0.125, 0.080, 0.095, 0.150, 0.125, 0…
## $ WHOLE_WEIGHT <dbl> 0.5140, 0.2255, 0.6770, 0.5160, 0.2050, 0.3515, 0.7775, 0…
## $ MEAT_WEIGHT  <dbl> 0.2245, 0.0995, 0.2565, 0.2155, 0.0895, 0.1410, 0.2370, 0…
## $ GUT_WEIGHT   <dbl> 0.1010, 0.0485, 0.1415, 0.1140, 0.0395, 0.0775, 0.1415, 0…
## $ SHELL_WEIGHT <dbl> 0.150, 0.070, 0.210, 0.155, 0.055, 0.120, 0.330, 0.260, 0…
## $ RINGS        <dbl> 15, 7, 9, 10, 7, 8, 20, 16, 9, 19, 14, 10, 11, 10, 10, 12…
## $ AGE          <dbl> 16.5, 8.5, 10.5, 11.5, 8.5, 9.5, 21.5, 17.5, 10.5, 20.5, …
abalone = abalone %>% mutate(SEX=factor(SEX))
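
In the rows shown by glimpse() above, AGE appears to be RINGS plus a constant 1.5. If that holds for the whole data set, AGE carries no extra information and should not be used as a predictor of RINGS. A minimal check, sketched here on a toy data frame built from the first five records printed above (the name toy is hypothetical; on the real data the same comparison would use abalone$AGE and abalone$RINGS):

```r
# Toy rows copied from the glimpse() output (first five records only)
toy <- data.frame(
  RINGS = c(15, 7, 9, 10, 7),
  AGE   = c(16.5, 8.5, 10.5, 11.5, 8.5)
)

# TRUE when AGE equals RINGS + 1.5 in every row of the toy data
age_is_offset <- isTRUE(all.equal(toy$AGE, toy$RINGS + 1.5))
age_is_offset
```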

Exploratory data analysis

Fit the model

Explore relative influence

Explore partial effects

Explore accuracy

Explore interactions

Tuning

Random Forest

library(randomForest)
abalone.rf = randomForest(RINGS ~ SEX + LENGTH + DIAMETER + HEIGHT +
                      WHOLE_WEIGHT + MEAT_WEIGHT + GUT_WEIGHT + SHELL_WEIGHT,
                      data=abalone, importance=TRUE,
                      ntree=1000)
abalone.imp = randomForest::importance(abalone.rf)
## Predictors can be ranked by either of two measures:
## * %IncMSE (mean decrease in accuracy)
##   For each tree, calculate the OOB prediction error (MSE).
##   Repeat after permuting each predictor in turn, then average
##   the difference in prediction error over all trees.
## * IncNodePurity (mean decrease in node impurity)
##   The total decrease in node impurity (residual sum of squares)
##   from splits on each predictor, averaged over all trees.
100*abalone.imp/sum(abalone.imp)
##                 %IncMSE IncNodePurity
## SEX          0.12854075      2.955944
## LENGTH       0.05797647      8.461982
## DIAMETER     0.06420550     10.355279
## HEIGHT       0.08923043     13.033870
## WHOLE_WEIGHT 0.06532046     14.727181
## MEAT_WEIGHT  0.13599780     13.665770
## GUT_WEIGHT   0.07368945     12.105419
## SHELL_WEIGHT 0.10624192     23.973352
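
Note that sum(abalone.imp) adds up both columns, so the values printed above are percentages of the combined total of two measures that sit on very different scales (which is why the %IncMSE column sums to less than 1). If the intent is to express each measure as a percentage of its own total, one option is to scale the columns separately. The sketch below uses a small hypothetical matrix (imp, with made-up numbers) laid out like the output of importance(); the same sweep() call could be applied to abalone.imp. It assumes all importances are non-negative, which is not guaranteed for %IncMSE.

```r
# Hypothetical importance matrix with the same layout as importance() output
imp <- matrix(c(30, 10, 20,
                500, 1500, 2000),
              ncol = 2,
              dimnames = list(c("A", "B", "C"),
                              c("%IncMSE", "IncNodePurity")))

# Divide each column by its own total so that each measure sums to 100
imp.pct <- 100 * sweep(imp, 2, colSums(imp), "/")
imp.pct
colSums(imp.pct)
```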
varImpPlot(abalone.rf)
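
The permutation idea behind %IncMSE can be sketched by hand. The example below is only an illustration under stated assumptions: it uses simulated data, a linear model, and a held-out test set in place of the random forest and its out-of-bag samples, and it does not apply the scaling that randomForest uses by default. All object names (dat, fit, perm.importance) are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated data: y depends strongly on x1 and not at all on x2
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 3 * dat$x1 + rnorm(n, sd = 0.5)
train <- dat[1:100, ]
test  <- dat[101:200, ]
fit <- lm(y ~ x1 + x2, data = train)

# Mean squared prediction error of a model on new data
mse <- function(model, newdata) {
  mean((newdata$y - predict(model, newdata))^2)
}
base.mse <- mse(fit, test)

# Increase in test MSE after shuffling one predictor, averaged over repeats
perm.importance <- function(var, nrep = 20) {
  mean(replicate(nrep, {
    shuffled <- test
    shuffled[[var]] <- sample(shuffled[[var]])
    mse(fit, shuffled) - base.mse
  }))
}
perm.imp <- sapply(c("x1", "x2"), perm.importance)
perm.imp
```

Shuffling a predictor the model relies on (x1) should inflate the error far more than shuffling an irrelevant one (x2).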

## partial dependence of RINGS on SHELL_WEIGHT
## (pdp uses the brute force method for random forests)
abalone.rf %>% pdp::partial('SHELL_WEIGHT') %>% autoplot()
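
To make "brute force" concrete, the calculation can be sketched by hand: for each value on a grid of the focal predictor, overwrite that predictor in every row of the data, predict, and average the predictions. The sketch below is an assumption-laden illustration rather than the pdp implementation; it uses simulated data and a linear model so that it runs without the abalone file, and the names dat, fit and partial.dep are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated data with two predictors
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 2 * dat$x1 + dat$x2 + rnorm(n, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = dat)

# Grid of values spanning the observed range of the focal predictor
grid <- seq(min(dat$x1), max(dat$x1), length.out = 10)

# For each grid value: set x1 to that value in every row, predict, average
pd <- sapply(grid, function(v) {
  tmp <- dat
  tmp$x1 <- v
  mean(predict(fit, newdata = tmp))
})
partial.dep <- data.frame(x1 = grid, yhat = pd)
partial.dep
```

The cost grows with the grid size times the number of rows, which is why this approach can be slow for large data sets or expensive models.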